This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter.

Today, we’re going to go through some examples and code for visualizations, including more interactive charts. A lot of this came from the websites:

https://r-graph-gallery.com/interactive-charts.html

https://www.data-to-viz.com

Set up

df <- read_excel('cleaned_data_Final.xlsx')

print(head(df))

Renaming

When creating tables and visualizations, recommend changing your variable labels and numeric values to give them meaning. So instead of my table saying TBI severity “1,2,3” it would read “mild, moderate, and severe” - this makes it more meaningful and easier for outside audiences to understanding

So, I create a new data frame called df_demo and change it there, keeping my original data frame intact.

df_demo <- df
# level naming for categorical variables
df_demo$gender <- factor(df_demo$gender,
                   levels = c(1,2,3),
                   labels = c("Male", "Female", "Nonbinary"))

df_demo$work_current <- factor(df_demo$work_current,
                   levels = c(1,0),
                   labels = c("Yes", "No"))

df_demo$severity <- factor(df_demo$severity,
                   levels = c(2,3),
                   labels = c("Moderate", "Severe"))

df_demo$mech_injury <- factor(df_demo$mech_injury,
                   levels = c(1,2,3,4,5),
                   labels = c("Fall", "MVC", "Sports", "Violence", "Pedestrian struck"))

df_demo$income <- factor(df_demo$income,
                   levels = c(1,2,3),
                   labels = c("<52K", "52K-156K", ">156K"))

df_demo$marital_status <- factor(df_demo$marital_status,
                                 levels = c(1, 2, 3, 4),
                                 labels = c("Single", "Married", "Divorced", "Widowed"))

Now I have two data frames, the original “df” which is still numeric and df_demo which is character

Tables

PPF_table <- df_demo %>%
  subset(., select = c(phys_health_index, emo_health_index, tbiqol_genconcern_tscore, bfi_extraversion, bfi_agreeable, bfi_consciousness, bfi_neuroticism, bfi_openness, income, marital_status, spstotal, frsbe_exec, frsbe_disinhib, frsbe_apathy, frsbe_total, severity)) %>%
  tbl_summary(
    missing = "no",
    by = severity,
    type = list(
      c(phys_health_index, emo_health_index, tbiqol_genconcern_tscore, bfi_extraversion, bfi_agreeable, bfi_consciousness, bfi_neuroticism, bfi_openness) ~ "continuous",
      c(severity, income) ~ "categorical"
    ),
    statistic = list(all_continuous() ~ "{mean} ({sd})", all_categorical() ~ "{n} ({p}%)"),
    label = list(
      phys_health_index ~ "Physical Health Index",
      emo_health_index ~ "Emotional Health Index",
      tbiqol_genconcern_tscore ~ "General Cognition",
      bfi_extraversion ~ "Extraversion", 
      bfi_agreeable ~ "Agreeable", 
      bfi_consciousness ~ "Consciousness", 
      bfi_neuroticism ~ "Neuroticism", 
      bfi_openness ~ "Openness",
      income ~ "Annual household income",
      marital_status ~ "Marital status",
      spstotal ~ "Social Support",
      frsbe_exec ~ "Executive function",
      frsbe_disinhib ~ "Disinhibition",
      frsbe_apathy ~ "Apathy",
      frsbe_total ~ "Total score",
      severity ~ "Severity of Injury"
    )
  ) %>%
  add_p(
    test = list(all_continuous() ~ "t.test", all_categorical() ~ "chisq.test"),
    pvalue_fun = ~style_pvalue(.x, digits = 2)
  ) %>%
  add_n()

print(PPF_table)
Characteristic N Moderate, N = 381 Severe, N = 561 p-value2
Physical Health Index 94 91 (14) 96 (13) 0.077
Emotional Health Index 94 97 (12) 101 (15) 0.13
General Cognition 94 36 (8) 36 (9) 0.68
Extraversion 94 7.16 (2.47) 6.89 (2.31) 0.60
Agreeable 94 7.11 (1.91) 7.11 (2.06)
0.99
Consciousness 94 8.16 (1.52) 7.75 (1.92) 0.25
Neuroticism 94 6.47 (2.17) 6.21 (2.61) 0.60
Openness 94 8.53 (2.06) 7.25 (1.92) 0.003
Annual household income 94

0.69
    <52K
12 (32%) 20 (36%)
    52K-156K
18 (47%) 28 (50%)
    >156K
8 (21%) 8 (14%)
Marital status 94

0.008
    Single
10 (26%) 32 (57%)
    Married
22 (58%) 16 (29%)
    Divorced
6 (16%) 8 (14%)
    Widowed
0 (0%) 0 (0%)
Social Support 94 84 (10) 76 (11) <0.001
Executive function 94 41 (9) 43 (11) 0.54
Disinhibition 94 32 (6) 34 (6) 0.31
Apathy 94 33 (8) 34 (9) 0.42
Total score 94 106 (19) 110 (22) 0.34
1 Mean (SD); n (%)
2 Welch Two Sample t-test; Pearson’s Chi-squared test

Then you can save it to your working drive as a word document. From that word document, you can change the font, layout without having to transpose the data/results.

#to save table to word doc
library(gt)
library(gtsummary)

gt_PPF_table <- as_gt(PPF_table)
   gtsave(gt_PPF_table, filename = "predictive_variables_table.docx")

Corrplots

Gahlot_plot

library(ggplot2)
library(reshape2)
library(Hmisc)
#Rename Variables in new dataset
df2 <- read_excel('cleaned_data_Final.xlsx') # Again, I save as a different data frame so as not to mess up the original for later analysis

# Rename multiple variables
df2 <- df2 %>%
  rename(Global = acsg_retain,
         IADL = acsi_retain,
         Leisure = acsl_retain,
         Fitness = acsf_retain,
         Social = acss_retain,
         Extraversion = bfi_extraversion,
         Agreeable = bfi_agreeable,
         Consciousness =bfi_consciousness,
         Neuroticism = bfi_neuroticism,
         Openness = bfi_openness,
         Apathy = frsbe_apathy,
         ExecFunc = frsbe_exec,
         Disinhibition = frsbe_disinhib,
         Total = frsbe_total,
         SocialSupport = spstotal,
         Communication = tbiqol_comm_tscore,
         ExecFuncQOL = tbiqol_execfunc_tscore, 
         GeneralCognition = tbiqol_genconcern_tscore,
         UpperExtremity = tbiqol_ue_tscore, 
         Fatigue = tbiqol_fatigue_tscore, 
         Mobility = tbiqol_mobility_tscore, 
         Headache = tbiqol_headache_tscore,
         Pain = tbiqol_pain_tscore,
         Anger = tbiqol_anger_tscore,
         PositiveAffect = tbiqol_posaffect_tscore,
         Age = age_current,
         Education = edu,
         Work = work_current,
         SubstanceUse = substance,
         Anxiety = tbiqol_anxiety_tscore, 
         Depression = tbiqol_depression_tscore, 
         Grief = tbiqol_grief_tscore, 
         TraitResilience = tbiqol_resilience_tscore,  
         SelfEsteem = tbiqol_selfesteem_tscore, 
         Stigma = tbiqol_stigma_tscore,
         TimeSinceInjury = time_injury,
         MaritalStatus = marital_status,
         SocialSupport = spstotal,
         HouseholdSize = house_size,
         PhysicalHealth = phys_health_index,
         EmotionalHealth = emo_health_index)
QOL_variables <- c("Global", "IADL",  "Leisure", "Fitness", "Social", "Anger", "Anxiety", "Depression", "Grief", "Resilience", "SelfEsteem", "Stigma", "TraitResilience", "PositiveAffect", "Communication", "GeneralCognition", "ExecFuncQOL", "UpperExtremity", "Fatigue", "Mobility", "Headache", "Pain")

# Ensure selected variables are in the dataframe
QOL_variables <- intersect(QOL_variables, colnames(df2))

# Extract relevant data
QOL_df2 <- df2[, QOL_variables]

# Calculate correlation matrix
cor_matrix <- rcorr(as.matrix(QOL_df2), type = "spearman")$r
cor_matrix[upper.tri(cor_matrix)] <- NA
p_matrix <- rcorr(as.matrix(QOL_df2), type = "spearman")$P
p_matrix[is.na(p_matrix)] <- .0000001
p_matrix[upper.tri(p_matrix)] <- NA

# Melt the correlation matrix for ggplot
melted_cor <- melt(cor_matrix, na.rm = TRUE)
melted_p <- melt(p_matrix, na.rm = TRUE)

melted_cor$p <- melted_p$value
melted_cor$psig <- ""
melted_cor$psig[melted_cor$p < .05] <- "*"
melted_cor$psig[melted_cor$p < .01] <- "**"
melted_cor$psig[melted_cor$p < .001] <- "***"

# Create a heatmap using ggplot2
p <- ggplot(melted_cor, aes(Var1, Var2, fill = value)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(value, 2)), vjust = 1, size = 6, family = "Times New Roman") +  # Adjust size and font
  geom_text(aes(label = psig), vjust = .25, size = 6, family = "Times New Roman") +  # Adjust size and font
  scale_fill_gradient2(low = "purple", mid = "white", high = "orange", 
                       midpoint = 0, limit = c(-1, 1), space = "Lab",
                       name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 20, family = "Times New Roman"),  # Adjust size and font
        axis.text.y = element_text(size = 20, family = "Times New Roman"),  # Adjust size and font
        axis.title = element_text(size = 14, family = "Times New Roman"),
        axis.ticks = element_line(linewidth = 1), 
        plot.title = element_text(size = 16, family = "Times New Roman"),  # Add title font
        plot.caption = element_text(size = 14, family = "Times New Roman")) +  # Add caption font
  labs(caption = "<.05 = *, <.01 = **, <.001 = ***") +
  xlab("") +
  ylab("") +
  ggtitle("Correlation Matrix Personal Protective Factors with ACS Variables")


# Save the plot to your working directory
ggsave("correlation_matrix_plot.png", plot = p, width = 15, height = 15)

#Show plot
print(p)

### Function for plot

# Define the function to create and display the correlation heatmap
create_corr_heatmap <- function(df, variables, plot_title) {
  
  # Ensure selected variables are in the dataframe
  selected_vars <- intersect(variables, colnames(df))
  
  # Extract relevant data
  selected_df <- df[, selected_vars]
  
  # Calculate correlation matrix
  cor_matrix <- rcorr(as.matrix(selected_df), type = "spearman")$r
  cor_matrix[upper.tri(cor_matrix)] <- NA
  p_matrix <- rcorr(as.matrix(selected_df), type = "spearman")$P
  p_matrix[is.na(p_matrix)] <- .0000001
  p_matrix[upper.tri(p_matrix)] <- NA
  
  # Melt the correlation matrix for ggplot
  melted_cor <- melt(cor_matrix, na.rm = TRUE)
  melted_p <- melt(p_matrix, na.rm = TRUE)
  
  melted_cor$p <- melted_p$value
  melted_cor$psig <- ""
  melted_cor$psig[melted_cor$p < .05] <- "*"
  melted_cor$psig[melted_cor$p < .01] <- "**"
  melted_cor$psig[melted_cor$p < .001] <- "***"
  
  # Create a heatmap using ggplot2
  p <- ggplot(melted_cor, aes(Var1, Var2, fill = value)) +
    geom_tile(color = "white") +
    geom_text(aes(label = round(value, 2)), vjust = 1, size = 6, family = "Times New Roman") +  # Adjust size and font
    geom_text(aes(label = psig), vjust = .25, size = 6, family = "Times New Roman") +  # Adjust size and font
    scale_fill_gradient2(low = "purple", mid = "white", high = "orange", 
                         midpoint = 0, limit = c(-1, 1), space = "Lab",
                         name = "Correlation") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 20, family = "Times New Roman"),  # Adjust size and font
          axis.text.y = element_text(size = 20, family = "Times New Roman"),  # Adjust size and font
          axis.title = element_text(size = 14, family = "Times New Roman"),
          axis.ticks = element_line(linewidth = 1),  # Replace 'size' with 'linewidth'
          plot.title = element_text(size = 16, family = "Times New Roman"),  # Add title font
          plot.caption = element_text(size = 14, family = "Times New Roman")) +  # Add caption font
    labs(caption = "<.05 = *, <.01 = **, <.001 = ***") +
    xlab("") +
    ylab("") +
    ggtitle(plot_title)
  
  # Display the plot
  print(p)
}

Then you can create multiple heatmaps for different variables without retyping everything

create_corr_heatmap(df2, c("Global", "IADL", "Leisure", "Fitness", "Social", "Age", "TimeSinceInjury", "MaritalStatus", "SocialSupport", "HouseholdSize"), "Corr Matrix for Environmental Supports")

Scatterplots

# Install hrbrthemes package if you haven't already
#install.packages("hrbrthemes")

# Load the package
library(hrbrthemes)
df %>%
  ggplot( aes(x=frsbe_apathy, y=acss_retain)) +
    geom_point(color="#69b3a2", alpha=0.6) +
    ggtitle("Relationship between Apathy and Social Re-engagement after TBI") +
    theme_ipsum() +
    theme(
      plot.title = element_text(size=12)
    ) +
    ylab('Social Re-engagement') +
    xlab('Apathy Score')

Scatterplots are sometimes supported by marginal distributions. It indeed adds insight to the graphic, revealing the distribution of both variables:

library(ggExtra)

# create a ggplot2 scatterplot
p <- df %>%
  ggplot( aes(x=tbiqol_grief_tscore, y=tbiqol_depression_tscore)) +
    geom_point(color="#69b3a2", alpha=0.8) +
    theme_ipsum() +
    theme(
      legend.position="none"
    )

# add marginal histograms
ggExtra::ggMarginal(p, type = "histogram", color="grey")

But what if I wanted each subscale of the FrSBe on there…

# Load necessary libraries
library(ggplot2)
library(tidyr)
library(dplyr)

# Convert to long format
df_long <- df %>%
  pivot_longer(cols = c(frsbe_apathy, frsbe_exec, frsbe_disinhib),
               names_to = "frsbe_type", values_to = "frsbe_value")

# Plot the data
ggplot(df_long, aes(x = frsbe_value, y = acss_retain, color = frsbe_type)) +
  geom_point(alpha = 0.6) +
  scale_color_manual(values = c("frsbe_apathy" = "#69b3a2", 
                                "frsbe_exec" = "#404080", 
                                "frsbe_disinhib" = "#e38900")) +
  ggtitle("Relationship between Apathy, Executive Function, and Disinhibition with Social Re-engagement") +
  ylab("Social Re-engagement") +
  xlab("Frontal Systems Behavior Subscales") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 12),
    legend.title = element_blank()
  )

What if I wanted to just look at Apathy again but split based on severity of injury?

library(ggplot2)
library(hrbrthemes)

# Plot the relationship between apathy and social re-engagement
df %>%
  ggplot(aes(x = frsbe_apathy, y = acss_retain, color = severity)) +  # Map color to severity
  geom_point(alpha = 0.6) +  # Use alpha for point transparency
  ggtitle("Relationship between Apathy and Social Re-engagement after TBI") +
  theme_ipsum() +  # Ensure this is loaded from hrbrthemes
  theme(
    plot.title = element_text(size = 12)
  ) +
  ylab('Social Re-engagement') +
  xlab('Apathy Score') +
  scale_color_gradient(low = "#69b3a2", high = "#e38900")  # Customize the color gradient for severity

Interactive

Interactivity allows us to zoom on a specific part of the graphic to detect more precise pattern. It also allows us to hover dots to get more information about them, like below:

library(ggplot2)
library(hrbrthemes)
library(plotly)

# Create the interactive scatter plot
interactive_scatter <- df_demo %>%
  mutate(text = paste("Apathy Score: ", frsbe_apathy, "\nSocial Re-engagement: ", acss_retain)) %>% #This is what will show when you hover over a plot
  ggplot(aes(x = frsbe_apathy, y = acss_retain, text = text)) +
  geom_point(aes(color = severity), alpha = 0.6) +  # Color points based on severity
  ggtitle("Relationship between Apathy and Social Re-engagement after TBI") +
  theme_ipsum() +  
  theme(
    plot.title = element_text(size = 12)
  ) +
  ylab('Social Re-engagement') +
  xlab('Apathy Score')

# Make the plot interactive with plotly
ggplotly(interactive_scatter, tooltip = "text")

Saving html widget

#packages
#install.packages("htmlwidgets")
library(htmlwidgets)
# save as an html widget
#saveWidget(interactive_scatter, "Interactive_ScatterPlot.html", selfcontained = TRUE)

OR

library(htmltools)
# save_html(interactive_scatter, file = "interactive_scatter.html")
# widget_plot <- as_widget(interactive_scatter)

# Now save it as an HTML file
# saveWidget(widget_plot, "Interactive_ScatterPlot.html", selfcontained = TRUE)

Bubble plot

A bubble plot is a scatterplot where a third dimension is added: the value of an additional numeric variable is represented through the size of the dots.You need 3 numerical variables as input: one is represented by the X axis, one by the Y axis, and one by the dot size.

In this example, I’m going to look at the relationship between grief and depression with the dot size related the person’s current engagement in activities

library(tidyverse)
library(hrbrthemes)
library(viridis)
library(gridExtra)
library(ggrepel)
library(plotly)
df$age_current<- as.factor(df$age_current)
df %>%
  arrange(desc(acsg_curr)) %>%
  ggplot(aes(x = tbiqol_depression_tscore, 
             y = tbiqol_grief_tscore, 
             size = acsg_curr, 
             color = age_current)) +  # You can replace 'severity' with another variable if needed
  geom_point(alpha = 0.7) +
  scale_size(range = c(1.4, 19), name = "Current Engagement Score") +  # Adjust size range as needed
  scale_color_viridis(discrete = TRUE) +  # Change to your desired color scale
  theme_ipsum() +
  theme(legend.position = "bottom") +
  ggtitle("Bubble Plot of Depression vs. Grief T-scores") +
  xlab("Depression T-Score") +
  ylab("Grief T-Score")

Interactive

library(ggplot2)
library(hrbrthemes)
library(viridis)
library(plotly)

# Ensure severity is a factor
df$severity <- as.factor(df$severity)

# Create the bubble plot
p <- df %>%
  ggplot(aes(x = tbiqol_depression_tscore, 
             y = tbiqol_grief_tscore, 
             size = acsg_curr, 
             color = age_current, 
             text = paste("Age:", age_current, 
                          "<br>Depression T-Score:", tbiqol_depression_tscore, 
                          "<br>Grief T-Score:", tbiqol_grief_tscore,
                          "<br>Current Engagement Score:", acsg_curr))) + 
  geom_point(alpha = 0.7) +
  scale_size(range = c(1.4, 19), name = "Current Engagement Score") + 
  scale_color_viridis(discrete = TRUE) + 
  theme_ipsum() +
  theme(legend.position = "bottom") +
  ggtitle("Bubble Plot of Depression vs. Grief T-scores") +
  xlab("Depression T-Score") +
  ylab("Grief T-Score")

# Convert to interactive plot
ggplotly(p, tooltip = "text")

Interactive Heatmaps

# Libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)
library(plotly)
# d3heatmap is not on CRAN yet, but can be found here: https://github.com/talgalili/d3heatmap
#To load this follow these steps
# install.packages("devtools")
library(devtools)
# install_github("talgalili/d3heatmap")
library(d3heatmap)
# Details and variations can be found here: https://www.data-to-viz.com/graph/heatmap.html

# Load data
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/multivariate.csv", header = T, sep = ";")
colnames(data) <- gsub("\\.", " ", colnames(data))

# Select a few country
data <- data %>%
  filter(Country %in% c("France", "Sweden", "Italy", "Spain", "England", "Portugal", "Greece", "Peru", "Chile", "Brazil", "Argentina", "Bolivia", "Venezuela", "Australia", "New Zealand", "Fiji", "China", "India", "Thailand", "Afghanistan", "Bangladesh", "United States of America", "Canada", "Burundi", "Angola", "Kenya", "Togo")) %>%
  arrange(Country) %>%
  mutate(Country = factor(Country, Country))

# Matrix format
mat <- data
rownames(mat) <- mat[,1]
mat <- mat %>% dplyr::select(-Country, -Group, -Continent)
mat <- as.matrix(mat)

# Heatmap
d3heatmap(mat, scale="column", dendrogram = "none", width="800px", height="800px", colors = "Blues")
library(heatmaply)
p <- heatmaply(mat,
        dendrogram = "none",
        xlab = "", ylab = "",
        main = "",
        scale = "column",
        margins = c(60,100,40,20),
        grid_color = "white",
        grid_width = 0.00001,
        titleX = FALSE,
        hide_colorbar = TRUE,
        branches_lwd = 0.1,
        label_names = c("Country", "Feature:", "Value"),
        fontsize_row = 5, fontsize_col = 5,
        labCol = colnames(mat),
        labRow = rownames(mat),
        heatmap_layers = theme(axis.line = element_blank())
        )

Stacked area: interactive

I don’t work much in time series, so here’s a sample of what it could look like and the code

# Libraries
library(ggplot2)
library(dplyr)
library(babynames) #just for the data for analysis, not needed for the code
library(viridis)
library(hrbrthemes)
library(plotly)

# Load dataset from github
data <- babynames %>% 
  filter(name %in% c("Ashley", "Amanda", "Jessica",    "Patricia", "Linda", "Deborah",   "Dorothy", "Betty", "Helen")) %>%
  filter(sex=="F")

# Plot
p <- data %>% 
  ggplot( aes(x=year, y=n, fill=name, text=name)) +
    geom_area( ) +
    scale_fill_viridis(discrete = TRUE) +
    theme(legend.position="none") +
    ggtitle("Popularity of American names in the previous 30 years") +
    theme_ipsum() +
    theme(legend.position="none")

# Turn it interactive
p <- ggplotly(p, tooltip="text")
p
# save the widget
# library(htmlwidgets)
# saveWidget(p, file=paste0( getwd(), "/HtmlWidget/ggplotlyStackedareachart.html"))

Some general thoughts:

  • R is finicky when it comes to packages. Often packages that are loaded early are masked by packages loaded late. If you had a code that was working, but now is not– check your packages. For me, d3heatmap package maskedmy print() function and I had to “turn off” d3heatmap to get my code to work.

  • There are a lot of time series visuals that I didn’t review as I have worked more in cross sectional data.. explore data-to-viz.com to check them out.

  • Python. Seaborn library (for me) is far superior with creating side by side visuals for looking at relationships. Check it out… Start a new Google Colab and enter the code below:

import seaborn as sns df = sns.load_dataset(‘iris’) import matplotlib.pyplot as plt

Basic correlogram >sns_plot = sns.pairplot(df) sns_plot.savefig(“IMG/correlogram1.png”)

Or make it a regression >sns_plot = sns.pairplot(df, kind=“reg”) sns_plot.savefig(“IMG/correlogram2.png”)

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.